What is a Robots.txt file and how to create a perfect Robots.txt file
Robots.txt is a small text file that resides in the root folder of your site. It tells search engine bots which parts of the site to crawl and index and which parts to leave alone.
If you make even a slight mistake while editing or customizing it, for example an accidental Disallow: /, search engine bots can stop crawling and indexing your site, and your site may no longer appear in search results.
In this article, I will explain what a Robots.txt file is and how to create a perfect Robots.txt file for SEO.
Why a Robots.txt file is necessary for a website
When search engine bots visit a website or blog, they follow the robots file and crawl the content accordingly. But if your site does not have a Robots.txt file, the bots will crawl and index all of your website's content, including parts you do not want indexed.
Search engine bots look for the robots file before indexing any website. When they find no instructions in a Robots.txt file, they index all of the website's content. And if they do find instructions, they follow them while indexing the site.
So, for these reasons, a Robots.txt file is required. If we do not give instructions to search engine bots through this file, they index our entire site, including data you never wanted in the search results.
Benefits of Robots.txt file
- It tells search engine bots which parts of the site to crawl and index and which parts to skip.
- You can prevent a particular file, folder, image, PDF, etc. from being indexed by search engines.
- Sometimes search engine crawlers crawl your site like hungry lions, which hurts your site's performance. You can ease this problem by adding a crawl-delay to your robots file. Googlebot does not obey this directive, but you can set the crawl rate in Google Search Console instead, which prevents your server from getting overloaded.
- You can make an entire section of a website private.
- You can prevent internal search results pages from showing in SERPs.
- You can improve your website's SEO by blocking low-quality pages.
Where does the Robots.txt file reside in the website?
If you are a WordPress user, it resides in the root folder of your site. If the file is not found at that location, search engine bots start indexing your entire website, because bots do not search your whole site for a Robots.txt file; they only look in the root.
If you do not know whether your site has a robots.txt file, simply type this into your browser's address bar: example.com/robots.txt
A text page will open in front of you, as you can see in the screenshot.
This is the robots.txt file of knowledge.tech. If you do not see such a text page, you need to create a robots.txt file for your site.
Apart from this, you can also check it in Google Search Console.
Basic Format of Robots.txt File
The basic format of a Robots.txt file is very simple and looks something like this:
User-agent: [user-agent name]
Disallow: [URL or page you do not want to crawl]
Together, these two lines make a complete robots.txt file. However, a robots file can contain many user-agent lines and directives (Disallow, Allow, Crawl-delay, etc.).
User-agent: names the search engine crawler/bot the rules apply to. If you want to give the same instruction to all search engine bots, use the * symbol after User-agent:. For example: User-agent: *
Disallow: tells bots not to crawl the specified files and directories.
Allow: tells bots that they may crawl and index the specified content.
Crawl-delay: the number of seconds bots should wait before loading and crawling the page content.
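Putting these directives together, a robots file might look like the sketch below. This is only an illustration: the WordPress-style paths and the 10-second delay are assumptions, not values your site must use.

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Crawl-delay: 10
```

Here all bots are asked to skip the admin folder, except for one file inside it that themes and plugins often need, and to pause 10 seconds between requests (remember that Googlebot ignores Crawl-delay).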
Preventing all web crawlers from indexing the website
User-agent: *
Disallow: /
By using this command in the Robots.txt file, you can prevent all web crawlers/bots from crawling the website.
Allowing all web crawlers to index all content
User-agent: *
Disallow:
This command in the Robots.txt file allows all search engine bots to crawl all pages of your site.
Blocking a Specific Folder for Specific Web Crawlers
User-agent: Googlebot
Disallow: /example-subfolder/
This command stops only the Google crawler from crawling example-subfolder. But if you want to block all crawlers instead, your Robots file would look like this:
User-agent: *
Disallow: /example-subfolder/
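If you want to verify how a given bot interprets these rules, Python's standard library ships a robots.txt parser. This sketch checks the Googlebot rule from above against a made-up example.com URL:

```python
from urllib import robotparser

# The rules from the example above: block only Googlebot from /example-subfolder/
rules = """User-agent: Googlebot
Disallow: /example-subfolder/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

url = "https://example.com/example-subfolder/page.html"
print(parser.can_fetch("Googlebot", url))  # False: Googlebot is blocked
print(parser.can_fetch("Bingbot", url))    # True: no rule applies to Bingbot
```

Because there is no User-agent: * group in these rules, every crawler other than Googlebot is allowed by default.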
Preventing a Specific Page (Thank You Page) from being indexed
User-agent: *
Disallow: /your-thank-you-page-url/
This will prevent all crawlers from crawling that page URL. But if you want to block only a specific crawler, write it like this:
User-agent: Bingbot
Disallow: /your-thank-you-page-url/
This command will prevent only Bingbot from crawling that page URL.
Adding Sitemap in Robots.txt File
Sitemap: https://www.example.com/sitemap.xml
You can add your sitemap anywhere in robots.txt, at the top or at the bottom. Here is a guide: How to Add Sitemap in Robots.txt File and Why it is Important?
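Putting it all together, a complete robots.txt for a typical WordPress blog might look like this sketch. The blocked paths, the internal-search pattern (/?s= is WordPress's default search query), and the sitemap URL are assumptions for illustration; adapt them to your own site.

```
User-agent: *
Disallow: /wp-admin/
Disallow: /?s=
Allow: /wp-admin/admin-ajax.php

Sitemap: https://www.example.com/sitemap.xml
```

This single file keeps the admin area and internal search results out of crawling while pointing every bot at your sitemap.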
You can comment for any kind of question or suggestion related to this article. If this article proved helpful for you, then don’t forget to share it!